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Abstract 

The determination of an isotope ratio by secondary ion mass spectrometry 
(SIMS) traditionally involves averaging a number of ratios collected over the 
course of a measurement. We show that this method leads to an additive 
positive bias in the expectation value of the estimated ratio that is approx- 
imately equal to the true ratio divided by the counts of the denominator 
isotope of an individual ratio. This bias does not decrease as the number of 
ratios used in the average increases. By summing all counts in the numera- 
tor isotope, then dividing by the sum of counts in the denominator isotope, 
the estimated ratio is less biased: the bias is approximately equal to the 
ratio divided by the summed counts of the denominator isotope over the 
entire measurement. We propose a third ratio estimator (Beale's estimator) 
that can be used when the bias from the summed counts is unacceptably 
large for the hypothesis being tested. We derive expressions for the vari- 
ance of these ratio estimators as well as the conditions under which they are 
normally distributed. Finally, we investigate a SIMS dataset showing the 
effects of ratio bias, and discuss proper ratio estimation for SIMS analysis. 
Keywords: 

secondary ion mass spectrometry, ratio, isotope measurement, statistics, 
bias 



1. Introduction 



Ratio estimation is broadly used in all scientific disciplines. Unfortu- 
nately, the mathematical issues associated with calculating meaningful ra- 
tios from experimental data are frequently ignored or misunderstood. In 
experiments dealing with blood sera in 1909, Greenwood & White [1] ob- 
served that a distribution of ratios (calculated as a mean of a number of 
individual ratios), randomly generated, was positively skewed. Karl Pear- 
son explained the effect mathematically [2]. The statistical properties of 
ratio estimation were well-explored in the 1950s and 1960s (e.g. [3], jl], 
[5]), and these methods have been applied sparingly to various disciplines 



The crux of the problem of ratio estimation can be understood by con- 
sidering a statistical variate z defined in the range < z < oo that has a 
probability p(^) of taking on the value z^. Assuming the expectation values 
of z and 1/z, E{z} and E{l/z}, exist, we can calculate: 



If z takes on just one value Zj (such that the probability p(zj) = 1), this 
reduces to: 



(e.g. ®, ®). 



E{z}E{l/z} = J2^ P(*) ~ ^ 




all i all k 



E{z}E{l/z} = ( Zj p( % )) (i p( Zj )) = p(^) 2 = 1 



(2) 
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If z takes on more than one value (z±, z 2 ,...), we expand the terms in Equa- 
tion m 



E{z}E{l/z} = ( Zl p( Zl ) + z 2 p(z 2 ) + ...)(- p(*i) + - P(z 2 ) + ... ) (3) 

Z\ z 2 J 



Collecting the squared terms and the cross terms for all possible values of 



z: 



E{z}E{i/ Z }= nr P (^) 2 ) 

Vail k J 



Vail m, n 



1 ) P{Zm)V{Z ri 

Zn Z m 



(4) 



/ 



We wish to compare this with 1, which we can write as the square of the 
sum of all possible probabilities p(zj): 



/ 



1= £p(*) Ep(^) = Ep( 



Zk) 



K all i 



. all i 



v all k 



^2 2p(z m )p(z ri 



all m, n 



\ 



J 



(5) 



Using the fact: 



— + — ) > 2 for z m , z n > and z m ^ z n 

Zn Z m 



(6) 



we see that if z that takes on more than one value, Equation [4] has a larger 
value than Equation [5] and: 



E{z}E{l/z} > 1 



(7) 
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Therefore we have shown that if z is not single- valued: 



E{l/z} > 1/E{z} 



8 



as also given in |8]. It follows that if z is an unbiased estimator of some 
quantity 6,1/ z cannot be an unbiased estimator of 1/9 [8]. For two positive, 
statistically independent random variables X and Y: 



showing that this ratio estimator is biased: an experiment attempting to 
measure E{Y}/E{X} using E{Y/X} will have an expectation value larger 
than the true value. 

The current methods for determining isotope ratios in natural samples 
by secondary-ion mass spectrometry (SIMS) have their roots in other types 
of mass spectrometry. In mass spectrometry, it is typically harder to de- 
termine the absolute abundance of an element than it is to determine the 
relative abundances of the isotopes of that element. This naturally leads to 
reporting isotope data as ratios, eliminating the need to know the absolute 
abundance of any species. A critical analytical problem is variation in the 
strength of the ion beam in the mass spectrometer. If the mass spectrometer 
has only one collector, it is important to account for changes in ion-beam 
strength in order to get accurate isotope ratios. Time interpolation is widely 
used to account for slow drifts in ion-beam strength. Isotopes are measured 
in sequence (e.g., 24 Mg, 25 Mg, 26 Mg, 24 Mg, ...) such that each isotope 
samples the time variation in the ion-beam strength. To correct for time 



E{Y/X} = E{Y}E{1/X} > E{Y}/E{X} 



(9) 
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drift, the signal for one isotope (e.g., 24 Mg) is interpolated between two 
measurements to give the signal at the time when another isotope was mea- 
sured. The isotope ratio is then calculated for each cycle and the individual 
ratios are averaged to give the final result. By collecting data as a series of 
ratios, one can also identify noise bursts or other problems that can affect 
the data, and cycles affected by the problem can be eliminated from the 
measurement. If count rates are so low that an isotope gets zero counts in 
a cycle, calculating by-cycle ratios does not work (i.e., a ratio with a zero 
denominator is meaningless). In this case, SIMS analysts typically add up 
all of the counts for each isotope from all of the cycles and calculate a single 
ratio for the measurement. It is widely believed that the two methods, using 
the mean of the individual ratios to calculate the final result and using the 
ratio of the total counts for each isotope to calculate the final result, give 
the same answer. But this is not the case. 

In this paper, we attempt to understand how ratio-estimator bias affects 
SIMS analysis and provide a framework for calculating isotope ratios with 
less bias. We will look at three ratio estimators and investigate the first 
four statistical moments of each estimator: the mean, variance, skewness, 
and kurtosis. From these statistical moments we can determine the bias 
(from the first statistical moment, the expectation value), efficiency (from 
the second statistical moment, the variance), and approach to normality 
(from the third and fourth statistical moment, the skewness and kurtosis) 
of each ratio estimator. A specific example of how bias affects SIMS data 
will be discussed. 
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2. First statistical moment of ratio estimators ri and r 2 : expec- 
tation value 



We assume two populations X and Y from which n subsamples x and 
y are randomly drawn. Our goal is to measure the ratio of the population 
means: 

R== (10) 
X V 1 

We can immediately think of two ways to estimate R given x and y. The 
ratio of the sample means: 

1 n 

- > Xi 

n 

and the mean of the sample ratios: 

i=l 1 i=l 

Under most circumstances in mass spectrometry, r 2 is employed as the ratio 
estimator. 

In this section we wish to calculate the first statistical moment (the 
expectation value) of each of these ratios (E{r±} and E{r 2 }) to investigate 
the accuracy of the ratio estimators. 
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Following [J], we can rewrite ri as: 



r 1 = R 



1 + 



y-Y - 

Y 



1 + 



x -X" 
X 



(13) 



where x and y are the sample means, and X and Y are the population 
means. The factor 



1 + 



x-X 



X 



-i 



(14) 



will converge as a geometric series if 



x-X 



X 



< 1 



(15) 



which is true under the assumption that x's are positive (in SIMS data, this 
is equivalent to nonzero counts), and n (the number of measurement cycles) 
is sufficiently large so that x < 2X (which is the case in any SIMS dataset 
where a reasonable measurement is sought). 

With these assumptions we can expand Equation 13 as a geometric 
series: 



n =R 



-R 



1 + 



1 + 



{x-X) (x-X) 2 (x-X) 3 

1 = I" 9 •} ' 



y-Y 

Y 

y-Y (x-X) (x-X)(y-Y) (x -Xf 



X 



X 



X 



Y 



X 



XY 



X' 



+ 



(x-Xf(y-Y) (x-X) 3 (x-X) 3 (y-_F) (x - X) 4 



X 2 Y 



X 



xY 



x 



(16) 



At this point we assume that our SIMS measurements, the sampled isotope 
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counts x and y, follow two Poisson distributions with expected values equal 
to the population means X and Y. The probability that k counts of x occur 
in a single sampling is X k e~ x /k\; the probability that k counts of y occur 
in a single sampling is Y k e~ Y jk\. For now we will assume X and Y are not 
independent. 

Before we calculate E{ri}, we note the following equalities (some of 
which are from [8]): 

E{(x-X)} = E{(y-Y)} = 
E{(x-X) 2 } = S 2 (x) =X/n 
E{(x-X)(y-Y)} = S(x,y) = S(x,y)/n 

E{{x-X)\y-Y)} = (S(x 2 ,y)-2XS(x,y))/n 2 (17) 

E{(x - X) 3 (y -¥)} = 3XS(x, y)/n 2 + Oin^) 

E{(x -X) 3 } = X/n 2 

E{(x - X) 4 } = 3X 2 /n 2 + 0(n- 3 ) 

The population variance of x is denoted by S 2 (x), and the population co- 
variance of x and y is denoted by S(x,y). 

We can now calculate E{ri} by substituting these expressions into Equa- 
tion 



16. Keeping terms to order 1/n (i.e., 0(n )): 



E{ ri } rj R 



1 1 1 (L s ^i ] \ i 1 ( 2 (2 1 3 ^ ' s{x2 > y) 



n \X XY J n 2 \x 2 XY \ X) x 2 Y 
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For independent X and Y this reduces to: 



E{n} « R 



1 + 



1 



+ 



nX n 2 X z \ 



(19) 



This shows that r±, the ratio of the means, is a biased estimator of the ratio, 
R, consistent with Equation [9] 
For r 2 we note: 



E{r 2 } = e\-Y^ P \ = - {E{ Pl } + E{p 2 } + ... + E{p n }) 



-(nE{ Pl }) = E{ Pt } = E\^ 

n Xi 



(20) 



That is, the expectation value of the mean of the ratios is equal to the 
expectation value of a single ratio. We choose a ratio pi for which the 



inequality in Equation 15 holds (in a reasonable SIMS sample set, there 



would be at least one ratio that satisfies this constraint), and then we can 



expand r 2 in a power series and obtain an equation like Equation 16, with 



individual Xi and yi replacing the sample means x and y. Then, to obtain 

we 



analogous expressions for the expectation values given in Equation 17 



substitute n = 1 and keep terms to 0(X ). This yields the expectation 



value for r 2 analogous to Equation 18 



E{r 2 } w R 



1 + = + 



x T 



3S(x,y) 
IF 



(21) 



For independent X and Y this reduces to: 



E{r 2 } w R 



1 



(22) 



X 



which shows that r 2 , the mean of the ratios, is also a biased estimator of R, 
but with a bias that is independent of the number of cycles n. Because of 
this, the bias of r 2 is usually much larger than ri. This arises from Equation 



20] the first-order bias term for ri, which is proportional to the variance of 
the mean of the denominator counts for n cycles, becomes proportional to 
the variance of an individual measurement Xi for r 2 . Since the variance of 
the mean of the counts over n cycles is equal to the variance of an individual 
measurement divided by n, the first-order bias term of T\ is a factor of 1/n 
smaller than the first-order bias term of r 2 . 

The important conclusion is that the expectation values of r\ and r 2 are 
both strictly greater than R: 

E ^> E M-l- R (23) 

The difference between the expectation value of the ratio estimators r\ 
or r 2 and the actual ratio R is the bias: 



B{n} =E{n} - R^ R 

B{r 2 } =E{r 2 } -R^R 



1 2 

+ 

nX n 2 X 
1 2 



(24) 



for independent x and y. The bias of r x decreases as the total number of 
denominator counts in the entire measurement nX increases, but the bias 
of r 2 only decreases as the expected number of counts per cycle X increases, 
independent of the total number of cycles n. 

Ratio estimators can be constructed with smaller bias than r±. Tin [1] 
looked at four ratio estimators and found that Beale's ratio estimator [9] has 
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the least bias for finite and infinite source populations with x and y following 
a bivariate normal distribution. Additionally, Tin [I] found that Beale's 
estimator performed well for small samples, e.g. n = 50. Srivastva et al. 
[TO] confirmed that Beale's estimator outperformed the standard estimator 
ri in a bivariate normal model. Monte Carlo simulations by Hutchison 
[TT] of various ratio estimators and Poisson-distributed random variables 
showed that Beale's estimator performed best for small n and small X. We 
choose Beale's estimator for these reasons and also because of its simple 
structure and accuracy compared to the other estimators on very small 
and/or unusual data sets [I]. 

Beale's ratio estimator for R = Y/X, assuming infinite source popula- 
tions, from which x and y are drawn, is: 

/ cov(x, y) \ 
\ nxy J 

r 3 = r 1 ±- / fS 25 

var(x) 

nx 

where n is the number of sample pairs Xi and yi, and: 
1 n 

cov(x, y) = S~] (xi - x) ^ - y) (26) 

n — 1 

i=l 



1 - 



varlx) = > [Xi — x) 

n 

i=l 



(27) 



are the sample covariance and sample variance, respectively. 



The expectation value of r^, analogous to Equations 18 and 21 



is com- 
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puted by Tin [I]. For our Poisson-distributed x and y and infinite source 
population, we find that Beale's estimator is unbiased to order 0(n~ 2 ). 
Beale's estimator, therefore, appears to perform much better than the stan- 
dard ratio estimators r% and r 2 for our circumstances. The order 0(n~ 3 ) 
terms of r 3 assuming Poisson distributions are very complicated, but we note 
that the Poisson distribution is well-approximated by the normal (Gaus- 
sian) distribution for large mean values. The theoretical expectation value 
of r 3 assuming x and y are normally distributed only differs by 0.01% when 
X = 10 from the expectation value assuming Poisson-distributed x and y. 
We report the expectation value of r 3 for X and Y larger than 10 using a 
normal approximation to order 0(n~ 3 ) as deduced by Tin jl] and Srivastva 
et al. [ID]: 



For normally distributed and independent X and Y, the bias of r 3 i 



Unlike r\ and r 2 , the bias of r 3 is negative. 

2.1. Computer simulations of the expectation values of r\, r 2 , and 

The bias in these ratio estimators can be shown to approximately agree 
with the above equations by simulating a large number of experiments com- 
putationally. We calculate r 1; r 2 , and r 3 using Poisson-distributed counts 
randomly generated by poissrnd in MATLAB version 7.10.0. For simplicity, 
we assume R—Y/X — l for all computer simulations (the results are the 




(28) 




(29) 
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same for all values of R). For each X and number of cycles n = 10, 100, 1000, 
we generate 10 9 , 10 8 , and 10 7 sets of simulated counts of x and y drawn from 
their associated Poisson distributions. For each of these sets, we calculate 
the ratio estimator r\, r 2 , and r%. We use the maximum likelihood estima- 
tor mle in MATLAB to determine the mean of the 10 9 , 10 8 , and 10 7 sets (for 
a given X and n) and 95% confidence interval of the mean. We compare 
these simulated values to the derived equations for the biases B{r 1^,3}- 

For moderately large n and X, we approximate the Poisson distribution 
by a normal distribution, which greatly speeds up the random-number gen- 
eration in MATLAB. To estimate the smallest biases, such as those for r 3 , we 
make a further assumption that the means of the counts follow a normal 
distribution, as do the sample variances and covariances of these counts [4J. 
The computer simulations closely follow the theoretical values, as shown in 
Figure [TJ 

2.2. Time-varying count rate in SIMS data 

The count rate of individual isotopes in a SIMS measurement can vary by 
large amounts over the course of a measurement. One contributing factor 
to this phenomenon is variation in primary beam intensity. In addition, 
as more primary ions become implanted into the sample, the ionization 
efficiency increases, and the measured count rate in the electron multiplier 
or Faraday cup increases. If these types of experimental variations are large 
compared to the statistical variance of the data, the isotope counts show 
a strong sample covariance. However, the statistical behavior of the data 
actually reflects a time-varying population mean: X and Y are no longer 
constant, but are time-dependent, and each subsample X{ and yi samples 
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from X(t) and Y(t) over the n cycles of the measurement. 

We model this phenomenon with computer simulations by assuming the 
count rate changes (over a range from ~10% to several hundred percent) 
for the numerator and denominator over the course of the measurement fol- 
lowing a sigmoidal function (continuously increasing) or various sinusoidal 
functions (increasing and decreasing). Our simulations show that the ex- 
pectation value of r\ and r 3 for these types of measurements should be 
calculated by substituting the measurement-time average of X(t) for X in 



Equation 19 and Equation 28 The analogous expectation value of r 2 is 
computed differently: the time- varying expectation value is calculated from 



Equation 22 by substituting X(t) for X, and then the expectation value 
over the entire measurement is deduced by computing the average of this 
varying expectation value. The variances of r 1; r 2 , and r 3 can be calculated 
in a similar way using the equations of Section [3j Using these methods, the 
statistical properties of a measurement with significant time-dependence 
can be understood. 



3. Second statistical moment of ratio estimators r l5 r 2 , r s : vari- 
ance 

We wish to measure a ratio (biased, as it is) as precisely as possible given 
a fixed number of cycles and isotope counts per cycle. The most efficient 
ratio estimator will have the lowest statistical variance, given by the second 
statistical moment of the ratio estimator. 

The variance of r 1; the ratio of the means, is: 



V{n} = E{{r 1 - E{ n }) 2 } = E{rl] - E{ n } 2 
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(30) 



Expanding Equations [16] and 18 while substituting the expressions in Equa- 
tion 17, and keeping terms of 0(n~ 2 )): 



V{ ri } ^R 2 



nil Y 



2Sjx 2 y) 

IF 



n 2 \ x 2 



6 3 a I a ( 4 

h = + Six, y) 

XY V 1 \Y 



XY x 2 Y 



16 5g(x,y) 

~r 2 2 

XT 



45(a; 2 ,|/) 2S(x,y 2 ) 



X 2 Y 



XY 



which in the case of independent X and Y reduces to: 



(31) 



V{n} « # 2 



n \X 



1 

F 



V,x 2 



+ 



XY 



(32) 



Since r2 is the mean of n ratios, we can write its variance, assuming the 
individual ratios yi/xi are uncorrelated, as: 



V{r 2 } = V 



n- 



^ i ^ x, l n 2 ^ V 



i=i 



i=l 



-F 

n I x 



j 

(33) 



If the individual ratios yi/xi are not independent: 



V{r 2 } 



1 ( (n - l)p\ j //, 



n 



r? 



(34) 



where p is the average covariance between individual ratios. However, in 
the application of SIMS measurements, the individual ratios will in general 



be uncorrelated. We substitute Equation 31 with n — 1 into Equation 33 
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and keep terms to 0(X ) which yields an approximation for the variance 
of r 2 : 



V{r 2 } 



R 2 



n 



1_ 1_ WS&y) 6 

X Y IF x 2 XY 



3 4S(x,y) 



y 



(35) 



In the case of independent X and Y, this reduces to: 



V{r 2 } 



R 2 



?? 



1 1 

T + ¥ x 



6 3 

=2 + — 



xy 



(36) 



For Beale's estimator r%, we assume the general case where x and y are 
Poisson-distributed. The variance of r% to order 0(n~ 2 ) as derived by Tin 
[4j is: 



V{r 3 } « i? 2 



l/l_ 1 2g^y) \ 1 f 2 4g(g,y) , Sfoy) 2 , 1 
n \X + Y IF 1 : ' ' -' -'- : -' — 



n 2 \x 2 X 2 Y 



X 2 Y 2 



XY 



which in the case of independent X and Y reduces to: 



(37) 



V{r 3 } « R 2 



1 1 

// \T + Y 



n Vx 



xy 



(3* 



For a large number of counts, the 0(n 1 ) approximation shows that all 
three ratio estimators ri, f2, ?"3 have equal variance: 



1 f £ £ 2^7/) 

n vx y xy 



(39) 



which is the expected and familiar result [12 
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3.1. Computer simulations of the variances of r\, r2, and r 3 

Similarly, the variances V{ri i 2,3} for the derived equations and computer 
simulations (calculated with 10 6 sets) are shown in Figure [2] For more than 
a few tens of counts per cycle, the variance follows the familiar expression 



given in Equation 39 The variance of the mean of the ratios, r 2 , is substan- 



tially more than this for less than a few tens of counts per cycle. If a la 
uncertainty for r 2 is computed from the square root of the approximation 



given in Equation 39 , this value will significantly underestimate the actual 
uncertainty of the r 2 ratio estimator. The variance of the ratio of means, 



ri, is very slightly larger than Equation 39 for very low counts 



4. Third and fourth statistical moment of ratio estimators of ri, 
r ii r 3 : skewness and kurtosis 

The third and fourth statistical moments, the skewness (71) and kurtosis 
(72), determine if a random variate is approximately normally distributed. 
Assuming normally distributed x and y, following Tin [4], to first order in 
1/n the skewness of the ratio estimators 7*1 and r 3 are: 



71 {n} 



( 



EUn-Ein}) 3 } 



(V{n}f 2 



\ 



/ 9 — o — \ 1 / 2 

nXY Q 2 + X Y 



6 + 44 + 



1 



YQ?/X + 1 J nX 



(40) 



7i W 



E{(r 3 -E{r 3 }) 3 } 
(V{r 3 }f 2 
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(41) 



/ 
\ 

with: 



YQ 



\ 



(nX Y 2 n 2 + X 2 ^ 2 



6 + 26 - 



YQ 2 /X 



I J nX J 



n=(l-XS(x,y)) 



The skewness of r 2 can be found (analogous to our derivation of the 
variance of r 2 ) by setting n — 1 in the expression for 71 {ri} and then 
multiplying this expression by 1/ \Jn 



*> ^ ( ;:y^ (42) 

{V{r 2 }) 

( 



Yn 



6 + 44 + 



\{nXY^ + nX 2 Yy 2 j^ \ YW/X+lJX 

However, this expression underestimates the skewness of r% for X less 
than a few hundred counts. This is because there are more terms in the 



2 3 

right-hand factor of Equation 42 (of order 1/X , 1/X ,...) that increase the 



skewness of r 2 quickly for small X. Computer simulations show that the 
— 2 

1/X term is important for predicting the skewness of r 2 down to tens of 
counts in X. 

We calculate the kurtosis of the ratio estimators, again following Tin |4], 

and find that similar to the second and third statistical moments (variance 

and skewness), the kurtosis of r 2 is very close to the kurtosis of r\ and 

down to a few tens of counts in X. Below that, the kurtosis of r 2 increases 
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extremely rapidly for n = 1000, 100, 10 as shown by computer simulations. 
We do not report details of the kurtosis calculations or simulations here as 
they are similar to the results for skewness of the ratio estimators. 



4-1. Computer simulations of the skewness of r\, r 2 , and r^ 

A normal distribution of x and y was assumed. Simulated random counts 
were created in MATLAB as for the simulations of expectation value and vari- 



ance (Sections 2.1 and 3.1), and the skewness and kurtosis of 10 4 sets of 
ratios for each ratio estimator r 1; r 2 , and r 3 were computed by MATLAB's 
skewness and kurtosis functions. The mean and its 95% confidence in- 
terval of 1000 kurtosis and skewness values were computed using MATLAB's 
mle function. The results of the computer simulations for skewness of the 
three ratio estimators are shown in Figure [3] 

4-2. Approach to normality of ' r\, r 2 , and r^ 

Ratio estimators in SIMS analysis are typically treated as variates fol- 
lowing a normal distribution. Such a treatment greatly simplifies the task 
of calculating the statistical properties of a measurement, such as its sta- 
tistical uncertainty. However, if the underlying distribution of the variate 
is not normal, it is not valid to treat it as such. Here we investigate under 
what conditions of the average number of counts per cycle (X) and number 
of cycles (n) this assumption is valid for the three ratio estimators r%, r 2 , 
and r 3 . 

The Jarque-Bera hypothesis test uses the sample kurtosis and skewness 
to determine the departure from normality of a given distribution [13J. The 
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test statistic is: 



JB 



1 
6 




(72 - 3) 2 



4 



) 



(43) 



where 77 is the sample size, 71 is the skewness and 72 is the kurtosis. We 
employ MATLAB's jbtest function to determine when a ratio estimator (7*1, 
r 2 , or r 3 ) with a given X and n is normally distributed. If there is less 
than a 5% chance that the data is normally distributed, jbtest returns 
one, otherwise it returns zero. For 77 > 2000 the statistic follows a \ 2 dis- 
tribution, for smaller 77, jbtest uses results from Monte-Carlo simulations. 
We performed the Jarque-Bera test 1000 times on each set of 77 = 10 4 ratios 
of ri, f2, and r 3 for a given X and n. From these 1000 tests, we calculated 
the mean of the Jarque-Bera test; results are shown in Figure |4| 

Where the Jarque-Bera test returns zero, the distribution is normal. We 
assign X norm to the number of counts per cycle where the average of the 
1000 Jarque-Bera tests falls below 0.5 in our simulations. At this number 
of counts per cycle for a given number of cycles n, the ratio estimator can 
be considered normal, and standard normal statistics can be applied. The 
Xnorm values for the three ratio estimators and n = 10, 100, 1000 are given 
in Tabled] 

5. An example analysis of SIMS data using ratio estimators r±, 
r 2 , and r 3 

It is important to note that the biases derived in Section [2] are for the 
expectation values of the ratio estimators: it is not true that for any given 
measurement r 2 > 77 > r 3 , only that the long-run average (if the measure- 
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76 
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64 
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1000 


<10 




1000 


30 


r 3 


1000 


<10 



Tabic 1: The minimum values required for normality of the three ratio estimators for 
number of cycles = 10, 100, 1000 as determined by computer simulations and the Jarque- 
Bera test. 

ment is performed an infinite number of times) of these ratio estimators will 
show this behavior. If the bias is small compared to the variance of a given 
measurement, the calculated ratio estimators have a reasonably high prob- 
ability of not being ordered as r 2 > T\ > r 3 . Simply subtracting the bias 
from T2 to reduce the bias in the calculated ratio is not a good strategy: the 
bias depends on the population mean X which is estimated in an unbiased 
way by x. However x has a variance of X/n which means the estimation 
of the bias is uncertain, and the experimenter is better off using a ratio 
estimator with lower bias to begin with. 

As an example of the dangers of using the ratio estimator r 2 , we look at 
secondary ion mass spectrometry (SIMS) isotope measurements of troilite 
(FeS) in the LL3.1 ordinary chondrite Krymka. The presence of extinct 
short-lived radionuclides in meteoritic material can be explained by pro- 
duction due to energetic particle spallation or contribution from a nearby 
stellar source, such as a supernova. However, 60 Fe is not efficiently produced 
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by spallation, so the presence of its daughter 60 Ni would be evidence of a 
stellar source. The Krymka meteorite has experienced little metamorphism 
to disturb the Fe-Ni isotopic system. When 60 Ni/ 61 Ni is plotted against 
56 Fe/ 61 Ni, the SIMS measurements will fall on a line (an isochron), with 
slope equal to the initial abundance of 60 Fe/ 56 Fe. Measurements of the Fe- 
Ni system in Krymka troilite were acquired on a Cameca ims 6f at Arizona 
State University. The ratios 60 Ni/ 61 Ni and 56 Fe/ 61 Ni were initially calcu- 
lated using r 2 , the mean of ratios, for 100 or 200 cycles with 15-100 counts 
per cycle of 61 Ni (X). Using r 2 , the inferred initial abundance of 60 Fe/ 56 Fe 
in troilite was (1.0-1.8) x 10 -7 [H]. When these ratios are calculated using 
ri, the ratio of means, this detection goes away, and the initial abundance 
of 60 Fe/ 56 Fe in the Krymka troilite grains is no longer resolved from zero 
(see Figure [5]) . 

6. Proper ratio calculations in mass spectrometry 

We have shown that the ratio estimator r 2 , the mean of individually 
measured ratios, is poorly behaved in its first four statistical moments, es- 
pecially with low counts. Most importantly, its expectation value is heavily 
biased and this bias is independent of the number of cycles over which the 
mean of the ratios is calculated. The bias of r 2 can be as much as several 
percent for tens of counts in the denominator (reference) isotope. The vari- 
ance of r 2 can be significantly worse than other ratio estimators for tens 
of isotope counts, resulting in much larger uncertainties on the measured 
ratio. Additionally, r 2 cannot be treated as a normally distributed variate 
for ~30 counts per cycle unless ~1000 cycles are measured, a factor of ~3 

more cycles than are needed for the two other ratio estimators. 
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The bias of r 2 is strictly positive; it cannot be thought of as another 
source of random error. It simply results in a calculated ratio higher than 
the actual source ratio. In many experiments, calculating r 2 instead of r\ or 
r3 is experimentally easier because instrumental conditions change over the 
course of the measurement. Time interpolation of the ratios is a standard 
method of dealing with this problem. However, one can time-interpolate 
the ion signals without calculating individual ratios for each cycle. For 
example, if one wishes to calculate r\, the ratio of the mean count rates, one 
can interpolate the signal for the denominator such that each measurement 
is modeled to have occurred at the same time as the measurement of the 
numerator. The mean count rates can then be determined for the measured 
signal in the numerator and the calculated signal in the denominator, and 
the final ratio r\ can be calculated. 

In experiments where the ratio estimator r 2 , the mean of ratios, must 
be used instead of r%, the ratio of means, the effect of ratio bias on the 
final result of the measurement can be understood in terms of the relative 
bias of the ratio divided by the relative standard deviation of the ratio. For 
r 2 , the relative bias of the ratio is independent of the number of cycles n, 
but the relative statistical standard deviation decreases as n increases. If 
the expected bias is much smaller than the statistical uncertainty of the 
measurement, the bias can be ignored. We define the relative bias of the 
ratio divided by the relative standard deviation (the square root of the 



variance) for r 2 . Using Equations 24 and 36 and keeping only first order 
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terms: 



B{r 2 }/R ^ = / nR 



(44) 



For the ratio bias associated with using r 2 to be insignificant, we require 
this quantity to be small. For example, if we require the bias to be less 
than 10% of the statistical standard deviation of the measurement, we can 
deduce an upper bound on how many cycles n can be used to calculate r 2 : 

Cx + y~\ 

n < ± '- (45) 

100 R K J 

That is, for the ratio bias associated with using r 2 (the mean of ratios) to 
be insignificant relative to the statistical uncertainty of the measurement, 
the number of cycles (n) must be smaller than the sum of expected counts 
per cycle (X+Y) divided by 100 times the expected ratio (R). 

In multi-collector inductively coupled plasma mass spectrometry (MC- 
ICP-MS), count rates are typically higher than SIMS and, correspondingly, 
the measured ratios are usually more precise. Isotopes of Pb are measured 
using MC-ICP-MS with a precision of ~50 ppm or better (e.g. [15]). The 
ratio estimator r 2 is typically used in ICP-MS. In the case of Pb, the counts 
of all the isotopes in the collectors are so high (X, Y ~ 10 9 ) that the ratio 



bias from calculating r 2 over ~100 cycles is insignificant (Equations 44 and 



45) 



In studies of radioactive nuclides and their daughter isotopes using SIMS, 
the daughter isotope is often in the numerator, and a high calculated ratio 
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is often a positive detection and of scientific interest. A significant positive 
bias from calculating the ratio as r 2 is therefore potentially misleading and 
should be avoided whenever possible. The final ratio should only be calcu- 
lated as r2 if the bias can be estimated to be sufficiently small compared to 



the statistical uncertainty of the measurement (Equations 44 and 45) and 
any additional systematic uncertainty resulting from the use of r\ or r 3 is 
significant. 

The ratio of means, r\ is also biased but the bias tends to zero as the 
number of cycles n goes to infinity. For certain high-accuracy measurements, 
the bias of r 1; the ratio of the means, may still be too large to test the 
hypothesis at hand with sufficiently high confidence. In this case, Beale's 
estimator r 3 should be used, which has ~1% or less of the bias of n, and 
similar variance and approach to normality. 
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Figure 1: The calculated bias of ri, r 2 , and r 3 (Equations 24 and 29) as a function of 
counts in the denominator for n=10, 100, and 1000 cycles are shown by curves. (The bias 
of estimator r 2 is independent of the number of samples n.) The computer simulations 
for Poisson-distributed x and y are given for each ratio estimator and n by squares, the 
95% confidence interval of the bias are given by vertical lines when the uncertainty is 
larger than the size of the marker. Circles indicate computations where x and y are 
normally distributed, triangles indicate computations where x, and y, as well as sample 
variances and covariances, were drawn from a normal distribution. Contributions from 
terms 0(X ) and smaller are evident for B{r 2 } near X = 10 where simulation points 
are slightly more biased than the line of calculated bias. For small X, calculation of r 2 
results in divide-by-zero, so those values are excluded. Since the bias of 7-3 is negative, 
the opposite of B{r^} is plotted. 
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Figure 2: The calculated variance of r%, and r% (Equations [32| [36| [38| divided by R 2 
as a function of counts in the denominator for 10, 100, and 1000 cycles are shown by 
curves. The computer simulations for Poisson-distributed x and y are given for each ratio 
estimator and n by squares, the 95% confidence interval of the variance is smaller than 
the size of the marker in all cases. More simulations were run for tens of counts in X to 

3 

show the deviation of variance in T2 from r\ and r%. Contributions from terms 0(X ) 
and smaller are evident for V{r2}/R 2 near X = 10 and n = 10 where the simulation 
points show slightly higher variance than the line of calculated variance. For X smaller 
than about 10, calculation of ri results in divide-by-zero, so those values are excluded 
from the plot. The calculated variance and computer simulations for are very close to 
rz and are thus obscured in this plot. 
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Figure 3: The calculated skewness of n, r 2 , and r 3 (Equations 40 42 and 41) as a 
function of counts in the denominator X (equal to counts in the numerator Y for these 
calculations) for 10, 100, and 1000 samples are given by curves. An approximation for 

2 

the 1/X term is included for the calculated skewness of r 2 . The computer simulations 
for normally distributed x and y (so X > 10) are given for each ratio estimator and n 
by circles, the 95% confidence interval of the skewnesses are given by error bars for some 
values of r%, the uncertainties for n and r% are always smaller than the size of the marker. 
For small X, calculation of r 2 results in divide-by-zero, so those values are excluded. 
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Figure 4: The average Jarque-Bera test for 1000 simulations of r\, r%, and r^, with 
Poisson-distributed counts, as a function of average counts in the denominator X. A 
value of zero means the null hypothesis "the data is normally distributed" cannot be 
rejected at the 5% significance level, a value of one means this hypothesis can be rejected 
at the 5% significance level. The X value at which the curve becomes less than 0.5 is the 
number of counts we require for a given ratio estimator (with a given n) to be considered 
normally distributed. Values for n are similar to values for r% so are mostly obscured. 
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Figure 5: Fe-Ni isotopic measurements of troilite (FeS) in the Krymka LL3.1 ordinary 
chondrite made by a Cameca ims 6f secondary ion mass spectrometer (SIMS) at Ari- 
zona State University [14] . Shown are two ratio estimators: r 2 (filled circles; data from 
[13]) and ?"i (unfilled squares). Lines are fit to the data points using weighted total 
least-squares, the slope of which is equal to the inferred initial 60 Fe/ 56 Fe in the troilite. 
A significant bias is associated with analyzing the data using the mean of ratios (ra), 
resulting in a false detection of extinct 60 Fe. This false detection disappears when a 
less-biased ratio estimator, r 1; is used on the same data. Isotopes were measured for 100 
or 200 cycles (n) with 15-100 counts per cycle of 61 Ni (X). 
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